Identifying Interesting Instances for Probabilistic Skylines
Abstract
Uncertain data arises in various applications such as sensor networks, scientific data management, data integration, and location-based applications. While significant research efforts have been dedicated to modeling, managing, and querying uncertain data, advanced analysis of uncertain data is still in its early stages. In this paper, we focus on skyline analysis of uncertain data, modeled as uncertain objects with probability distributions over a set of possible values called instances. Computing the exact skyline probabilities of instances is expensive, and unnecessary when the user is only interested in instances with skyline probabilities over a certain threshold. We propose two filtering schemes for this case: a preliminary scheme that bounds an instance’s skyline probability for filtering, and an elaborate scheme that uses an instance’s bounds to filter other instances based on the dominance relationship. We identify applications where instance-level filtering is useful and desirable. Our algorithms can be easily adapted to filter at the object level if the application domain requires it. Moreover, the uncertainty model we adopt in this paper allows missing probabilities of uncertain objects as well as arbitrary probability distributions over instances. We experimentally demonstrate the effectiveness of our filtering schemes on both a real NBA data set and a synthetic data set.
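The threshold-based filtering idea can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes the common definition in which an instance's skyline probability is its own probability times, for every other object, the probability that the object does not dominate it, and it assumes smaller values are preferred in every dimension. The names `dominates`, `dominating_mass`, `skyline_probability`, `upper_bound`, and `interesting_instances` are hypothetical. The bound used here is a simple one: because every factor of the product lies in [0, 1], evaluating only a subset of the other objects can never underestimate the exact value, so an instance whose bound is below the threshold can be discarded without the full computation.

```python
from math import prod
from typing import Dict, List, Sequence, Tuple

# Hypothetical data layout: object name -> list of (point, probability).
# Probabilities of one object may sum to less than 1 (missing probability).
Instance = Tuple[Tuple[float, ...], float]
Dataset = Dict[str, List[Instance]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one.
    Assumption: smaller values are better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def dominating_mass(point: Sequence[float], instances: List[Instance]) -> float:
    """Total probability of the instances of one object that dominate `point`."""
    return sum(p for q, p in instances if dominates(q, point))


def skyline_probability(data: Dataset, obj: str, idx: int) -> float:
    """Exact skyline probability of instance `idx` of object `obj`
    under the assumed definition (objects are independent)."""
    point, p = data[obj][idx]
    return p * prod(
        1.0 - dominating_mass(point, inst)
        for name, inst in data.items()
        if name != obj
    )


def upper_bound(data: Dataset, obj: str, idx: int, sample: Sequence[str]) -> float:
    """Upper bound from a subset of objects: dropping factors in [0, 1]
    can only increase the product."""
    point, p = data[obj][idx]
    return p * prod(
        1.0 - dominating_mass(point, data[name]) for name in sample if name != obj
    )


def interesting_instances(data: Dataset, threshold: float, sample: Sequence[str]):
    """Return (object, index, probability) for instances at or above the threshold,
    skipping the exact computation whenever the cheap bound already fails."""
    result = []
    for obj, instances in data.items():
        for idx in range(len(instances)):
            if upper_bound(data, obj, idx, sample) < threshold:
                continue  # filtered: cannot reach the threshold
            pr = skyline_probability(data, obj, idx)
            if pr >= threshold:
                result.append((obj, idx, pr))
    return result


# Toy data, invented for illustration only.
data: Dataset = {
    "A": [((1.0, 1.0), 0.5), ((4.0, 4.0), 0.5)],
    "B": [((2.0, 2.0), 0.6), ((3.0, 5.0), 0.3)],  # 0.1 of the mass is missing
    "C": [((5.0, 5.0), 1.0)],
}

for obj, idx, pr in interesting_instances(data, threshold=0.25, sample=["A"]):
    print(obj, idx, round(pr, 3))
```

With this toy data and a threshold of 0.25, only the first instances of A and B survive; the single instance of C and the second instance of B are discarded by the bound alone, before any exact computation.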
Similar resources
Probabilistic Skylines on Uncertain Data
Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic...
Continuous Probabilistic Skyline Queries over Uncertain Data Streams
Recently, some approaches of finding probabilistic skylines on uncertain data have been proposed. In these approaches, a data object is composed of instances, each associated with a probability. The probabilistic skyline is then defined as a set of non-dominated objects with probabilities exceeding or equaling a given threshold. In many applications, data are generated as a form of continuous d...
Semi-Skylines and Skyline-Snippets
Skyline evaluation techniques (also known as Pareto preference queries) follow a common paradigm that eliminates data elements by finding other elements in the data set that dominate them. To date already a variety of sophisticated skyline evaluation techniques are known, hence skylines are considered a well researched area. Nevertheless, in this paper we come up with interesting new aspects. O...
UNIVERSITÄT AUGSBURG Semi-Skylines and Skyline Snippets
Skyline evaluation techniques (also known as Pareto preference queries) follow a common paradigm that eliminates data elements by finding other elements in the data set that dominate them. To date already a variety of sophisticated skyline evaluation techniques are known, hence skylines are considered a well researched area. Nevertheless, in this paper we come up with interesting new aspects. O...
Efficient Skyline Computation in MapReduce
Skyline queries are useful for finding interesting tuples from a large data set according to multiple criteria. The sizes of data sets are constantly increasing and the architecture of back-ends are switching from single-node environments to non-conventional paradigms like MapReduce. Despite the usefulness of skyline queries, existing works on skyline computation in MapReduce do not take full a...